30 research outputs found
Relevance-Redundancy Dominance: a threshold-free approach to filter-based feature selection
Feature selection is used to select a subset of relevant features in machine learning, and is vital for simplification, improving efficiency and reducing overfitting. In filter-based feature selection, a statistic such as correlation or entropy is computed between each feature and the target variable to evaluate feature relevance. A relevance threshold is typically used to limit the set of selected features, and features can also be removed based on redundancy (similarity to other features). Some methods are designed for use with a specific statistic or certain types of data. We present a new filter-based method called Relevance-Redundancy Dominance that applies to mixed data types, can use a wide variety of statistics, and does not require a threshold. Finally, we provide preliminary results through extensive numerical experiments on public credit datasets.
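For orientation, the conventional thresholded filter that the abstract contrasts itself with can be sketched as follows. This is not the Relevance-Redundancy Dominance method, whose details the abstract does not give. The function names, the choice of absolute Pearson correlation as the statistic, and both threshold values are assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def filter_select(features, target, relevance_min=0.3, redundancy_max=0.9):
    """Thresholded filter selection (hypothetical helper, illustrative thresholds).

    features: dict mapping feature name -> list of floats.
    Keeps features whose |correlation| with the target reaches relevance_min,
    then drops any feature too similar to an already selected one.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(
        (name for name in relevance if relevance[name] >= relevance_min),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        redundant = any(
            abs(pearson(features[name], features[kept])) > redundancy_max
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected

# Toy data: two nearly identical relevant features and one irrelevant feature.
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a_copy": [2.0, 4.0, 6.0, 8.0, 10.5],
    "noise": [3.0, 1.0, 3.0, 1.0, 3.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.1]
selected = filter_select(features, target)
```

On this toy data only one of the two near-duplicate features survives and the irrelevant one is dropped. Both outcomes depend on the two thresholds, which is the tuning burden a threshold-free method aims to avoid.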
Constraint acquisition and the data collection bottleneck
The field of constraint acquisition (CA) aims to remove the “modelling bottleneck” by learning constraints from examples. However, it gives rise to a “data collection bottleneck” as humans must prepare a suitable (labelled) dataset. A recently published paper described an unsupervised CA method called MineAcq that can learn standard CA benchmarks. In this paper we summarise the results, and apply MineAcq to a new, noisy, unlabelled dataset that was not designed for CA.
Robust constraint acquisition by sequential analysis
Modeling a combinatorial problem is a hard and error-prone task requiring expertise. Constraint acquisition methods can automate this process by learning constraints from examples of solutions and (usually) non-solutions. We describe a new statistical approach based on sequential analysis that is orders of magnitude faster than existing methods, and gives accurate results on popular benchmarks. It is also robust in the sense that it can learn constraints correctly even when the data contain many errors.
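As a generic illustration of sequential analysis, the sketch below applies Wald's sequential probability ratio test to a stream of observations recording whether one candidate constraint was violated. The abstract does not state which sequential test the authors use or how it is applied, so the function name, the hypothesised violation rates p0 and p1, and the error levels alpha and beta are all assumptions.

```python
import math

def sprt(violations, p0=0.05, p1=0.5, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test on a stream of booleans.

    H0: the candidate constraint holds and is violated only through noise (rate p0).
    H1: the candidate is not a real constraint and is violated often (rate p1).
    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing it rejects the candidate
    lower = math.log(beta / (1 - alpha))   # crossing it accepts the candidate
    llr = 0.0
    used = 0
    for violated in violations:
        used += 1
        if violated:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return ("reject", used)
        if llr <= lower:
            return ("accept", used)
    return ("undecided", used)

clean = sprt([False] * 20)                 # never violated
one_error = sprt([True] + [False] * 20)    # one noisy example, then clean
refuted = sprt([True, True])               # violated repeatedly
```

The test stops as soon as the evidence is strong enough rather than scanning the whole dataset, and a single erroneous example only delays acceptance instead of preventing it.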
An analytics-based heuristic decomposition of a bilevel multiple-follower cutting stock problem
This paper presents a new class of multiple-follower bilevel problems and a heuristic approach to solving them. In this new class of problems, the followers may be nonlinear, do not share constraints or variables, and are at most weakly constrained. This allows the leader variables to be partitioned among the followers. We show that current approaches for solving multiple-follower problems are unsuitable for our new class of problems and instead we propose a novel analytics-based heuristic decomposition approach. This approach uses Monte Carlo simulation and k-medoids clustering to reduce the bilevel problem to a single level, which can then be solved using integer programming techniques. The examples presented show that our approach produces better solutions and scales up better than the other approaches in the literature. Furthermore, for large problems, we combine our approach with the use of self-organising maps in place of k-medoids clustering, which significantly reduces the clustering times. Finally, we apply our approach to a real-life cutting stock problem. Here a forest harvesting problem is reformulated as a multiple-follower bilevel problem and solved using our approach. This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/228
A Partial Taxonomy of Substitutability and Interchangeability
Substitutability, interchangeability and related concepts in Constraint
Programming were introduced approximately twenty years ago and have given rise
to considerable subsequent research. We survey this work, classify and relate
the different concepts, and indicate directions for future work, in particular
with respect to making connections with research into symmetry breaking. This
paper is a condensed version of a larger work in progress. Comment: 18 pages, The 10th International Workshop on Symmetry in Constraint
Satisfaction Problems (SymCon'10)
Bounding the search space of the Population Harvest Cutting Problem with Multiple Size Stock Selection
In this paper we deal with a variant of the Multiple Stock Size Cutting Stock Problem (MSSCSP) arising from population harvesting, in which some sets of large pieces of raw material (of different shapes) must be cut following certain patterns to meet customer demands of certain product types. The main extra difficulty of this variant of the MSSCSP lies in the fact that the available patterns are not known a priori. Instead, a given complex algorithm maps a vector of continuous variables called a values vector into a vector of total amounts of products, which we call a global products pattern. Modeling and solving this MSSCSP is not straightforward since the number of values vectors is infinite and the mapping algorithm consumes a significant amount of time, which precludes complete pattern enumeration. For this reason a representative sample of global products patterns must be selected. We propose an approach to bounding the search space of the values vector and an algorithm for performing an exhaustive sampling using such bounds. Our approach has been evaluated with real data provided by an industry partner.
Generating difficult CNF instances in unexplored constrainedness regions
When creating benchmarks for satisfiability (SAT) solvers, we need Conjunctive Normal Form (CNF) instances that are easy to build but hard to solve. A recent development in the search for such methods has led to the Balanced SAT algorithm, which can create k-CNF instances with m clauses of high difficulty, for arbitrary k and m. In this article, we introduce the No-Triangle CNF algorithm, a CNF instance generator based on the cluster coefficient graph statistic. We empirically compare the two algorithms by fixing the arity and the number of variables, but varying the number of clauses. We find that the hardest instances produced by each method belong to different constrainedness regions. In the 3-CNF case, for example, hard No-Triangle CNF instances reside in the highly constrained region (many clauses), while Balanced SAT instances obtained from the same parameters are easy to solve. This allows us to generate difficult instances where existing algorithms fail to do so.
A grouping genetic algorithm for joint stratification and sample allocation designs
Finding the optimal stratification and sample size in univariate and multivariate sample design is hard when the population frame is large. There are alternative ways of modelling and solving this problem, and one of the most natural uses genetic algorithms (GA) combined with the Bethel-Chromy evaluation algorithm. The GA iteratively searches for the minimum sample size necessary to meet precision constraints in partitionings of atomic strata created by the Cartesian product of auxiliary variables. We point out a drawback with classical GAs when applied to the grouping problem, and propose a new GA approach using “grouping” genetic operators instead of traditional operators. Experiments show a significant improvement in solution quality for similar computational effort.
Classifier-based constraint acquisition
Modeling a combinatorial problem is a hard and error-prone task requiring significant expertise. Constraint acquisition methods attempt to automate this process by learning constraints from examples of solutions and (usually) non-solutions. Active methods query an oracle while passive methods do not. We propose a known but not widely used application of machine learning to constraint acquisition: training a classifier to discriminate between solutions and non-solutions, then deriving a constraint model from the trained classifier. We discuss a wide range of possible new acquisition methods with useful properties inherited from classifiers. We also show the potential of this approach using a Naive Bayes classifier, obtaining a new passive acquisition algorithm that is considerably faster than existing methods, scalable to large constraint sets, and robust under errors.
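One possible reading of "deriving a constraint model from a trained classifier" is sketched below. It is an assumption, not the paper's algorithm. Each candidate constraint is treated as a binary feature (violated or not), its class-conditional violation probabilities are estimated in Naive Bayes fashion with Laplace smoothing, and the candidate is kept when a violation is strong evidence of a non-solution. The candidate set (pairwise disequalities), the function name and the threshold are hypothetical.

```python
import math
from itertools import combinations, product

def learn_constraints(examples, labels, candidates, min_log_ratio=1.0):
    """Keep candidate disequalities (i, j), read as x_i != x_j, whose violation
    is much more likely in non-solutions than in solutions.

    examples: list of assignments (tuples of values).
    labels:   list of booleans, True for a solution.
    """
    n_sol = sum(labels)
    n_non = len(labels) - n_sol
    learned = []
    for i, j in candidates:
        viol_sol = sum(1 for ex, y in zip(examples, labels) if y and ex[i] == ex[j])
        viol_non = sum(1 for ex, y in zip(examples, labels) if not y and ex[i] == ex[j])
        # Laplace-smoothed estimates of P(violated | class).
        p_sol = (viol_sol + 1) / (n_sol + 2)
        p_non = (viol_non + 1) / (n_non + 2)
        if math.log(p_non / p_sol) >= min_log_ratio:
            learned.append((i, j))
    return learned

# Toy problem: three variables over {0, 1, 2} with the hidden constraint x0 != x1.
examples = list(product(range(3), repeat=3))
labels = [ex[0] != ex[1] for ex in examples]
candidates = list(combinations(range(3), 2))
learned = learn_constraints(examples, labels, candidates)
```

Because the decision rests on estimated probabilities rather than on a single counterexample, a few mislabelled examples shift the ratio only slightly, which matches the robustness property the abstract mentions. The work per candidate is a single pass over the data.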
Solving a hard Cutting Stock Problem by machine learning and optimisation
We are working with a company on a hard industrial optimisation problem: a version of the well-known Cutting Stock Problem in which a paper mill must cut rolls of paper following certain cutting patterns to meet customer demands. In our problem each roll to be cut may have a different size, the cutting patterns are semi-automated so that we have only indirect control over them via a list of continuous parameters called a request, and there are multiple mills each able to use only one request. We solve the problem using a combination of machine learning and optimisation techniques. First we approximate the distribution of cutting patterns via Monte Carlo simulation. Secondly we cover the distribution by applying a k-medoids algorithm. Thirdly we use the results to build an ILP model which is then solved.
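The first two steps of this pipeline can be sketched as below, with the ILP step omitted. Everything concrete here is an assumption: simulate_pattern is a hypothetical stand-in for the semi-automated cutting process, requests are taken to be pairs of parameters in [0, 1], and the clustering is a plain alternating k-medoids with Manhattan distance rather than whatever variant the authors used.

```python
import random

def simulate_pattern(request, rng):
    """Hypothetical stand-in for the cutting process: maps a request (two
    continuous parameters) to a noisy vector of product counts."""
    a, b = request
    return (round(10 * a + rng.gauss(0, 0.5)), round(10 * b + rng.gauss(0, 0.5)))

def manhattan(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))

def k_medoids(points, k, rng, max_iter=50):
    """Alternating k-medoids on distinct points; returns the k medoid points."""
    medoids = rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for idx, p in enumerate(points):
            nearest = min(medoids, key=lambda m: manhattan(p, points[m]))
            clusters[nearest].append(idx)
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)
                continue
            best = min(
                members,
                key=lambda c: sum(manhattan(points[c], points[o]) for o in members),
            )
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return [points[m] for m in medoids]

rng = random.Random(0)
# Step 1: Monte Carlo sampling of requests and the patterns they produce.
requests = [(rng.random(), rng.random()) for _ in range(200)]
patterns = sorted({simulate_pattern(r, rng) for r in requests})  # distinct patterns
# Step 2: cover the sampled distribution with a few representative patterns.
medoids = k_medoids(patterns, 3, rng)
```

Using medoids rather than centroids means every representative is a pattern that was actually observed in simulation, so each one can be offered to the ILP as a pattern that some request can produce.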